Fast Decision Tree Ensembles for Optical Character Recognition
Abstract
A new boosting algorithm of Freund and Schapire is used to improve the performance of an ensemble of decision trees that are constructed using the information ratio criterion of Quinlan’s C4.5 algorithm. This boosting algorithm iteratively constructs a series of decision trees, each tree being trained and pruned on examples that have been filtered by the previously trained trees. Examples that the previous trees in the ensemble classified incorrectly are resampled with higher probability, giving a new probability distribution for the next tree in the ensemble to train on. By combining the very fast decision tree ensemble with a more accurate (but slower) neural network, we obtain a speedup by a factor of eight over the neural network alone and yet achieve a much lower error rate than the tree ensemble alone.
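The resampling loop described in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: one-level trees (stumps) stand in for pruned C4.5 trees, the weight update follows the two-class AdaBoost.M1 form (the abstract does not name the exact variant), and the confidence-threshold rule in `cascade` is a hypothetical way of combining the fast ensemble with a slower classifier, since the abstract does not say how the two are combined. All function names are invented for this sketch.

```python
import math
import random

def train_stump(samples):
    """Fit a one-level tree (a stand-in for a pruned C4.5 tree, by assumption)."""
    best = None
    for f in range(len(samples[0][0])):
        for t in sorted({x[f] for x, _ in samples}):
            for sign in (1, -1):
                errors = sum(1 for x, y in samples
                             if (sign if x[f] >= t else -sign) != y)
                if best is None or errors < best[0]:
                    best = (errors, f, t, sign)
    _, f, t, sign = best
    return lambda x: sign if x[f] >= t else -sign

def boost(data, rounds, rng):
    """Boosting by resampling: each round trains on a draw from the current weights."""
    n = len(data)
    weights = [1.0 / n] * n
    ensemble = []  # list of (vote_weight, classifier)
    for _ in range(rounds):
        sample = rng.choices(data, weights=weights, k=n)
        clf = train_stump(sample)
        wrong = [clf(x) != y for x, y in data]
        err = sum(w for w, bad in zip(weights, wrong) if bad)
        if err >= 0.5:
            continue  # simplification: skip a round that is no better than chance
        err = max(err, 1e-10)  # avoid division by zero for a perfect classifier
        beta = err / (1.0 - err)
        ensemble.append((math.log(1.0 / beta), clf))
        # Shrink the weight of correctly classified examples, so that
        # misclassified ones are drawn with higher probability next round.
        weights = [w * (1.0 if bad else beta) for w, bad in zip(weights, wrong)]
        total = sum(weights)
        weights = [w / total for w in weights]
    return ensemble

def vote(ensemble, x):
    """Return (label, confidence); confidence is |weighted vote| / total weight."""
    score = sum(a * clf(x) for a, clf in ensemble)
    total = sum(a for a, _ in ensemble)
    return (1 if score >= 0 else -1), abs(score) / total

def cascade(ensemble, slow_classifier, x, threshold):
    """Hypothetical combination rule: defer to the slow model on low confidence."""
    label, confidence = vote(ensemble, x)
    return label if confidence >= threshold else slow_classifier(x)
```

Each example is a `(features, label)` pair with labels in {-1, +1}. In a setup like this, `cascade` sends only the low-confidence inputs to the slow classifier, which is one plausible reading of how most of the speed advantage of the trees could be retained; the threshold would have to be tuned on held-out data.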
Similar resources
Advances In Arabic Text Recognition
This paper describes improvements to a system that recognizes Arabic and Farsi text in low-quality, low-resolution, binary document images. Performance advances reflected in the current system largely result from the introduction of ensembles of decision trees as the base recognizer and the development of a new methodology for training tree ensembles that relies on boosting in the sample space...
Fast Feature Selection in an HMM-Based Multiple Classifier System for Handwriting Recognition
A novel, fast feature selection method for hidden Markov model (HMM) based classifiers is introduced in this paper. It is also shown how this method can be used to create ensembles of classifiers. The proposed methods are tested in the context of a handwritten text recognition task.
Speech Emotion Recognition With TGI+.2 Classifier
We have adapted a classification approach coming from optical character recognition research to the task of speech emotion recognition. The classification approach enjoys the representational power of a syntactic method and efficiency of statistical classification. The syntactic part implements a tree grammar inference algorithm. We have extended this part of the algorithm with various edit cos...
COGNITUS - Fast and Reliable Recognition of Handwritten Forms Based on Vector Quantisation
We report on an efficient intelligent character recognition tool for the automatic treatment of handwritten bank transfer forms. The classification is based on nearest-neighbor algorithms and a novel binary clustering technique for the generation of large prototype sets. We introduce a new confidence measure which can be used on a decision tree structure to combine lowest error rates with a very hi...
A Novel Approach to Recognition of the Isolated Persian Characters using Decision Tree
Optical Character Recognition (OCR) is an area of research that has attracted the interest of researchers for the past forty years. Although the subject has been a central topic for many researchers for years, it remains one of the most challenging and exciting areas in pattern recognition. Because of the cursive nature of the Persian language, recognition of its characters is more difficult than ...